An undecidable property of context-free languages 
Abstract



We prove that there exists no algorithm to decide whether the language
generated by a context-free grammar is dense with respect to the lexicographic
ordering. As a corollary to this result, we show that it is undecidable whether
the lexicographic orderings of the languages generated by two context-free
grammars have the same order type.

1 Introduction

Suppose that \(\Sigma\) is an alphabet equipped with a (strict) linear order relation \(<\).
We may extend \(<\) to a lexicographic ordering \(<_\ell\) of \(\Sigma^*\) by defining, for all words
\(u, v \in \Sigma^*\), \(u <_\ell v\) if either \(u\) is a proper prefix of \(v\), or \(u = xay\) and \(v = xbz\) for
some \(a, b \in \Sigma\) and \(x, y, z \in \Sigma^*\) with \(a < b\). Thus, when \(L \subseteq \Sigma^*\), then \((L, <_\ell)\)
is a linear ordering. It is known (see e.g. [BE07, Cour78a]) that if the size of
\(\Sigma\) is two or more, then every countable linear ordering is isomorphic to a linear
ordering \((L, <_\ell)\) for some language \(L \subseteq \Sigma^*\). Let us call a linear ordering regular,
context-free, or deterministic context-free if it is isomorphic to the linear ordering
of a language of the appropriate type.
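The definition of \(<_\ell\) can be read directly as a comparison procedure. The following Python sketch is our own illustration, not code from the paper: the names `lex_less` and `lex_sorted` are hypothetical, words are strings (or any sequences of letters), and the order on \(\Sigma\) is passed as a sequence listing the letters from smallest to largest.

```python
from functools import cmp_to_key


def lex_less(u, v, order):
    """Return True iff u <_l v; `order` lists the alphabet from smallest to largest."""
    rank = {letter: r for r, letter in enumerate(order)}
    for x, y in zip(u, v):
        if x != y:
            # u = x a y', v = x b z' with a != b: the first differing letters decide
            return rank[x] < rank[y]
    # one word is a prefix of the other: u <_l v iff u is a proper prefix of v
    return len(u) < len(v)


def lex_sorted(words, order):
    """Sort a finite language by <_l."""
    def cmp(u, v):
        if lex_less(u, v, order):
            return -1
        return 1 if lex_less(v, u, order) else 0
    return sorted(words, key=cmp_to_key(cmp))


print(lex_sorted(["ba", "a", "ab", "b", "aab"], "ab"))  # ['a', 'aab', 'ab', 'b', 'ba']
```

When the order on \(\Sigma\) agrees with code-point order, as for `"ab"` above, this coincides with Python's built-in string comparison, which also ranks a proper prefix first.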

It follows by the characterization of regular and algebraic trees by their branch
languages [Cour78a, Cour78b] that the regular (deterministic context-free) linear
orderings are exactly those that can be defined by recursion schemes of order 0
(order 1, respectively). See also [BE07]. Moreover, a well-ordering is regular
if and only if its order type is less than \(\omega^\omega\), and deterministic context-free if
and only if its order type is less than \(\omega^{\omega^\omega}\), cf. [BE10]. (These well-orderings
have other characterizations using operations on well-orderings or automata, cf.
[Del04, KRS03].) Moreover, it follows from results proved in [Heil80] that the
Hausdorff rank [Ros82] of every scattered regular linear ordering is finite. As
shown in [BE09], the Hausdorff rank of every scattered deterministic context-free
linear ordering is less than \(\omega^\omega\). Ordinals and scattered linear orderings defined
by higher order recursion schemes are studied in [BC10].

It was shown in [Thom86] that it is decidable for regular linear orderings (given 
as lexicographic orderings of regular languages) whether they are isomorphic. 
The decidability status of the isomorphism problem for deterministic context-free 
linear orderings is open. Here, we show that it is undecidable for context-free 
linear orderings (given by context-free grammars) whether they are isomorphic. 
Moreover, we show that it is undecidable whether a context-free language defines 
a dense linear ordering. 

2 Linear orderings and context-free grammars 

A linear ordering [Ros82] is a set \(S\) equipped with a strict linear order relation
\(<\). In this paper, we restrict ourselves to linear orderings \((S, <)\) where \(S\) is a
countable set. A linear ordering \((S, <)\) is dense if it has at least two elements and
for any \(x, y \in S\) with \(x < y\) there is some \(z \in S\) with \(x < z < y\). Two linear orderings
\((S, <)\) and \((S', <)\) are isomorphic if there is a bijection \(h : S \to S'\) such that
\(xh < yh\) for all \(x, y \in S\) with \(x < y\). Isomorphic linear orderings have the same
order type. It is known that up to isomorphism there are 4 dense (countable)
linear orderings: the ordering \(\mathbb{Q}\) of the rationals, possibly equipped with a least
or greatest element (or both). The order type of \(\mathbb{Q}\) is denoted \(\eta\).

A context-free grammar \(G\) over a (terminal) alphabet \(\Sigma\) consists of a finite
nonempty set \(N\) of nonterminals and a finite set of productions \(A \to u\), where
\(A \in N\) and \(u \in (N \cup \Sigma)^*\). It is assumed that \(N\) and \(\Sigma\) are disjoint. A nonterminal
\(A_0\), called the start symbol, is distinguished. The derivation relation \(\Rightarrow^*\) is
defined as usual. For each nonterminal \(A\), we let \(L(G, A) = \{u \in \Sigma^* : A \Rightarrow^* u\}\)
denote the language generated from \(A\). The context-free language \(L(G) \subseteq \Sigma^*\)
generated by \(G\) is \(L(G, A_0)\). We call \(G\) a prefix grammar if the languages \(L(G, A)\)
are all prefix (or prefix-free) languages, i.e., no word of \(L(G, A)\) is a proper prefix
of another word of \(L(G, A)\). A right linear grammar is a context-free
grammar such that, except possibly for the last letter, each letter occurring in
the word on the right side of a production is a terminal letter. It is well known
that a language is regular if and only if it can be generated by a right linear
grammar. For all unexplained notions on context-free grammars and languages,
we refer to any standard book on formal languages.
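For a finite sample of words, the prefix property is easy to test. The Python sketch below is a hypothetical helper (`is_prefix_free` is our name, not the paper's). It sorts the sample and compares neighbours only, which suffices because, in any lexicographic ordering, a word that is a proper prefix of another word is also a proper prefix of its immediate successor. Such a test can refute, but never confirm, the prefix property of an infinite language.

```python
def is_prefix_free(words):
    """True iff no word of the finite collection `words` is a proper prefix of another."""
    ordered = sorted(set(words))
    # The words having p as a prefix form an interval of the sorted list starting at p,
    # so it suffices to compare each word with its successor.
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))


print(is_prefix_free({"01", "0001", "1101"}))  # True
print(is_prefix_free({"0", "01"}))             # False
```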

The reverse of a word \(u\) will be denoted \(u^{-1}\).

Remark 2.1 It was pointed out by Luc Boasson that there is no algorithm to 
decide for a context-free grammar G whether it is a prefix grammar. Moreover, 
there is no algorithm to decide whether a given context-free grammar generates 
a prefix language. 
3 Some undecidability results

In this section our aim is to prove that it is undecidable for a context-free (prefix)
grammar \(G\) over a 2-letter alphabet whether or not \((L(G), <_\ell)\) is a dense ordering,
or a linear ordering isomorphic to the ordering \(\mathbb{Q}\) of the rationals. It follows from
this result that it is undecidable whether or not the lexicographic orderings of two
context-free languages, given by context-free (prefix) grammars, are isomorphic.
In our proofs, we will use a reduction from the Post Correspondence Problem
(PCP).

Let \((\alpha, \beta)\) be an instance of PCP, where \(\alpha = (\alpha_1, \ldots, \alpha_n)\) and \(\beta = (\beta_1, \ldots, \beta_n)\)
are nonempty sequences of nonempty words over the two-letter alphabet \(\{a, b\}\).
Then consider the alphabet

\[ \Gamma = \{1, \ldots, n, a, b, \text{¢}, \$\}, \]

ordered as indicated. For convenience, we will also refer to the elements of
\(\Gamma\) by the letters \(c_1, c_2, \ldots, c_{n+4}\), with \(c_1\) denoting 1, \(c_2\) denoting 2, etc. For
\(j = 1, \ldots, n+2\), define \(\Delta_j\) as the 3-letter alphabet \(\{d_{j0}, d_{j1}, d_{j2}\}\) and extend
the linear order on \(\Gamma\) to a linear ordering of the set

\[ \Delta = \Gamma \cup \bigcup_{j=1}^{n+2} \Delta_j \]

so that

\[ c_j < d_{j0} < d_{j1} < d_{j2} < c_{j+1} \]

for all \(j = 1, \ldots, n+2\). Note that \(\Delta\) contains \(4n + 10\) letters and there is no
"extra letter" between ¢ and $.

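To make the construction of \(\Delta\) concrete, the following Python sketch lists its letters in increasing order for a given \(n\) and checks the two facts just stated. The function name `build_delta` and the spelling of letters as strings (`'¢'`, `'$'`, and `'d{j},{k}'` for \(d_{jk}\)) are assumptions of this illustration.

```python
def build_delta(n):
    """Letters of Delta in increasing order, for a PCP instance with n pairs."""
    gamma = [str(i) for i in range(1, n + 1)] + ["a", "b", "¢", "$"]  # c_1, ..., c_{n+4}
    delta = []
    for j, c_j in enumerate(gamma, start=1):
        delta.append(c_j)
        if j <= n + 2:
            # c_j < d_{j0} < d_{j1} < d_{j2} < c_{j+1}
            delta += [f"d{j},{k}" for k in range(3)]
    return delta


for n in range(1, 6):
    delta = build_delta(n)
    assert len(delta) == 4 * n + 10
    assert delta.index("$") == delta.index("¢") + 1  # no letter between ¢ and $
```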
We will construct a (prefix) grammar \(G = G_{\alpha,\beta}\) over the alphabet \(\Delta\) such that
\((L(G), <_\ell)\) is dense if and only if \((\alpha, \beta)\) has no solution. The grammar \(G\) will be
designed so that it generates the language

\[ L = L_\alpha \cup L_\beta \cup L_1 \cup \cdots \cup L_{n+2}, \]

where

1. \(L_\alpha = \{ i_1 \cdots i_m (\alpha_{i_1} \cdots \alpha_{i_m})^{-1} \text{¢} : 1 \le i_k \le n,\ m \ge 1 \}\),

2. \(L_\beta = \{ i_1 \cdots i_m (\beta_{i_1} \cdots \beta_{i_m})^{-1} \$ : 1 \le i_k \le n,\ m \ge 1 \}\),

3. \(L_j = \{1, \ldots, n, a, b\}^* Q_j\), where \(Q_j = \{d_{j0}, d_{j2}\}^* d_{j1}\), for \(j = 1, \ldots, n+2\).
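The words of \(L_\alpha\), \(L_\beta\), and \(Q_j\) can be generated directly from these definitions. In the Python sketch below, the function names `pcp_word` and `q_word` are ours, indices are 1-based as in the text, a word is a list of letters, and \(d_{jk}\) is spelled `'d{j},{k}'` (an assumption of this illustration).

```python
def pcp_word(indices, words, end):
    """i_1 ... i_m (w_{i_1} ... w_{i_m})^{-1} end, for words = alpha (end '¢') or beta (end '$')."""
    image = "".join(words[i - 1] for i in indices)   # w_{i_1} ... w_{i_m}
    return [str(i) for i in indices] + list(reversed(image)) + [end]


def q_word(j, middle):
    """The word of Q_j = {d_{j0}, d_{j2}}* d_{j1} whose {d_{j0}, d_{j2}} part is
    spelled by `middle`, a string over {'0', '2'}."""
    assert set(middle) <= {"0", "2"}
    return [f"d{j},{k}" for k in middle] + [f"d{j},1"]


alpha = ("a", "ab", "bba")
print(pcp_word((2, 1), alpha, "¢"))  # ['2', '1', 'a', 'b', 'a', '¢']
print(q_word(3, "02"))               # ['d3,0', 'd3,2', 'd3,1']
```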

Note that each \(Q_j\) and each \(L_j\) is a dense regular language whose order type
is \(\eta\), the order type of the rationals. The same fact holds for the languages
\(Q = \bigcup_{j=1}^{n+2} Q_j\) and \(L' = \bigcup_{j=1}^{n+2} L_j\), since the order type of any finite nonempty
sum \(\sum_{i \in I} P_i\) of linear orderings \(P_i\) of order type \(\eta\) is also \(\eta\).
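The density of each \(Q_j\) can be witnessed constructively. Writing 0, 1, 2 for \(d_{j0} < d_{j1} < d_{j2}\), so that \(Q_j\) becomes \(\{0, 2\}^* 1\) and Python's string comparison coincides with \(<_\ell\), the sketch below returns a word strictly between two given words, as well as words above and below a given word. The function names and the case analysis are our own, not taken from the paper; the construction relies only on the facts that prepending 2 to a word of \(\{0, 2\}^* 1\) yields a larger word and prepending 0 yields a smaller one.

```python
def in_q(x):
    """Membership in Q = {0, 2}* 1."""
    return x.endswith("1") and set(x[:-1]) <= {"0", "2"}


def above(x):
    return "2" + x   # 2x > x for every x in Q


def below(x):
    return "0" + x   # 0x < x for every x in Q


def between(x, y):
    """Given x < y in Q, return some z in Q with x < z < y."""
    p = 0
    while x[p] == y[p]:   # Q is prefix-free, so x and y differ at some position
        p += 1
    head, c, d = x[:p], x[p], y[p]
    if (c, d) == ("0", "2"):
        return head + "1"
    if (c, d) == ("0", "1"):                 # y = head + "1"
        return head + "0" + above(x[p + 1:])
    return head + "2" + below(y[p + 1:])     # (c, d) == ("1", "2"), x = head + "1"


print(between("01", "1"), between("1", "21"))  # 021 201
```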
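The grammar \(G\) has start symbol \(S\) and contains the following productions in
BNF, where \(i\) ranges over \(1, \ldots, n\) and \(j\) over \(1, \ldots, n+2\):

\[
\begin{aligned}
S &\to A\text{¢} \mid B\$ \mid C\\
A &\to iA\alpha_i^{-1} \mid i\alpha_i^{-1}\\
B &\to iB\beta_i^{-1} \mid i\beta_i^{-1}\\
C &\to iC \mid aC \mid bC\\
C &\to D_1 \mid \cdots \mid D_{n+2}\\
D_j &\to d_{j0}D_j \mid d_{j2}D_j \mid d_{j1}
\end{aligned}
\]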

It is clear that G is a prefix grammar. 
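One way to sanity-check the grammar is to enumerate its short words. The Python sketch below encodes \(G_{\alpha,\beta}\) as a dictionary from nonterminals to right-hand sides and collects all terminal words up to a length bound by breadth-first search over leftmost derivations. The dictionary layout, the function names, the nonterminal names `D1`, `D2`, ..., and the spelling `'d{j},{k}'` of \(d_{jk}\) are assumptions of this illustration; a finite check like this is evidence for, not a proof of, the prefix property.

```python
from collections import deque


def make_grammar(alpha, beta):
    """Productions of G_{alpha,beta} as {nonterminal: [right-hand side tuples]}.
    Terminals: '1', ..., 'n', 'a', 'b', '¢', '$', and 'd{j},{k}' for d_{jk}."""
    n = len(alpha)
    idx = [str(i) for i in range(1, n + 1)]
    g = {"S": [("A", "¢"), ("B", "$"), ("C",)], "A": [], "B": []}
    for i, a_i, b_i in zip(idx, alpha, beta):
        ra, rb = tuple(reversed(a_i)), tuple(reversed(b_i))   # alpha_i^{-1}, beta_i^{-1}
        g["A"] += [(i, "A") + ra, (i,) + ra]
        g["B"] += [(i, "B") + rb, (i,) + rb]
    g["C"] = [(x, "C") for x in idx + ["a", "b"]] + [(f"D{j}",) for j in range(1, n + 3)]
    for j in range(1, n + 3):
        g[f"D{j}"] = [(f"d{j},0", f"D{j}"), (f"d{j},2", f"D{j}"), (f"d{j},1",)]
    return g


def words_up_to(g, max_len, start="S"):
    """All terminal words of length <= max_len derivable from `start`,
    found by breadth-first search over leftmost derivations."""
    found, queue = set(), deque([(start,)])
    while queue:
        form = queue.popleft()
        if sum(s not in g for s in form) > max_len:
            continue   # no production erases terminals, so this form stays too long
        k = next((p for p, s in enumerate(form) if s in g), None)
        if k is None:
            found.add(form)
        else:
            queue.extend(form[:k] + rhs + form[k + 1:] for rhs in g[form[k]])
    return found


g = make_grammar(("a", "ab", "bba"), ("baa", "aa", "bb"))
sample = sorted(words_up_to(g, 5))
# Comparing sorted neighbours suffices to detect a word that is a prefix of another.
print(all(b[:len(a)] != a for a, b in zip(sample, sample[1:])))  # True
```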

Proposition 3.1 \((L(G_{\alpha,\beta}), <_\ell)\) is dense if and only if \((\alpha, \beta)\) has no solution.

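Proof. Assume that \(i_1, \ldots, i_m\) is a solution of \((\alpha, \beta)\). Let
\(u = (\alpha_{i_1} \cdots \alpha_{i_m})^{-1} = (\beta_{i_1} \cdots \beta_{i_m})^{-1}\). Then

\[ u_\alpha = i_1 \cdots i_m u \text{¢} \quad\text{and}\quad u_\beta = i_1 \cdots i_m u \$ \]

are in \(L\). However, since \(L\) is a prefix language and no letter of \(\Delta\) lies strictly
between ¢ and $, there is no word \(v\) in \(L\) with

\[ u_\alpha <_\ell v <_\ell u_\beta, \]

showing that \(L\) is not dense.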
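This half of the proof can be replayed on a concrete instance. The sketch below uses the PCP instance \(\alpha = (a, ab, bba)\), \(\beta = (baa, aa, bb)\) with the solution 3, 2, 3, 1 (our own choice of example, not one from the paper). It encodes the letter \(c_j\) as the pair `(j, 0)` and \(d_{jk}\) as `(j, k + 1)`, so that Python's tuple comparison realizes both the order on \(\Delta\) and \(<_\ell\), and confirms that \(u_\alpha <_\ell u_\beta\) while no letter of \(\Delta\) lies strictly between ¢ and $. The helper `letter` is hypothetical and assumes \(n \le 9\), so that indices are single characters.

```python
def letter(x, n):
    """Encode a letter of Gamma = {1, ..., n, a, b, ¢, $} as c_j = (j, 0); assumes n <= 9."""
    gamma = [str(i) for i in range(1, n + 1)] + ["a", "b", "¢", "$"]
    return (gamma.index(x) + 1, 0)


alpha, beta = ("a", "ab", "bba"), ("baa", "aa", "bb")
n, solution = len(alpha), (3, 2, 3, 1)

top = "".join(alpha[i - 1] for i in solution)
assert top == "".join(beta[i - 1] for i in solution)   # (3, 2, 3, 1) solves (alpha, beta)
prefix = "".join(map(str, solution)) + top[::-1]        # i_1 ... i_m u, with u = top reversed
u_alpha = tuple(letter(x, n) for x in prefix + "¢")
u_beta = tuple(letter(x, n) for x in prefix + "$")

# All of Delta: c_j = (j, 0) for j <= n + 4 and d_{jk} = (j, k + 1) for j <= n + 2
delta = [(j, 0) for j in range(1, n + 5)] + [(j, k) for j in range(1, n + 3) for k in (1, 2, 3)]
gap = [x for x in delta if letter("¢", n) < x < letter("$", n)]
print(u_alpha < u_beta, gap)  # True []
```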

Suppose now that \((\alpha, \beta)\) has no solution. We show that \(L\) is dense. To this end,
suppose that \(u, v \in L\) with \(u <_\ell v\). Since \(L\) is a prefix language, \(u\) and \(v\) can be
decomposed as

\[ u = wcu', \quad v = wdv', \]

where \(c\) and \(d\) are letters with \(c < d\). It is not possible that \(c = \text{¢}\) and \(d = \$\),
since otherwise we would have \(u' = v' = \varepsilon\), and the maximal prefix of \(w\) that is
in \(\{1, \ldots, n\}^*\) would give a solution of \((\alpha, \beta)\).

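Since $ is the largest letter of \(\Delta\), the case \(c = \text{¢}\) would force \(d = \$\). Thus, either
\(c \in \Delta_i\) or \(c = c_i\) for some \(i = 1, \ldots, n+2\). There are three cases to consider.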

1. \(c \in \Delta_i\) for some \(i = 1, \ldots, n+2\), so that \(cu' \in Q_i\). If \(d\) is also in \(\Delta_i\), then
\(dv' \in Q_i\), and since \(cu' <_\ell dv'\), there exists some \(x \in Q_i\) with \(cu' <_\ell x <_\ell dv'\),
and thus \(u = wcu' <_\ell wx <_\ell wdv'\), where \(wx\) is in \(L\). If \(d \notin \Delta_i\), then choose
any word \(x \in Q_i\) with \(cu' <_\ell x\). Since every letter of \(\Delta_i\) is smaller than \(d\), we again
have \(u = wcu' <_\ell wx <_\ell wdv'\) and \(wx \in L\).

2. \(d \in \Delta_i\) for some \(i = 1, \ldots, n+2\). This case is symmetrical to the previous
case.

3. Thus the only remaining case is when \(c = c_i\) for some \(i = 1, \ldots, n+2\) and
\(d = c_j\) for some \(j = 1, \ldots, n+4\) with \(i < j\). In this case let \(x\) be any word
in \(Q_i\). We have that \(u = wcu' <_\ell wx <_\ell wdv'\) and \(wx \in L\).
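The case analysis of this proof is constructive. The following Python sketch (all function names are ours) returns a word of \(L\) strictly between two words \(u <_\ell v\) of \(L\), following cases 1 to 3, and raises an error in the excluded situation \(c = ¢\), \(d = \$\). Letters are encoded as pairs, \(c_j\) as `(j, 0)` and \(d_{jk}\) as `(j, k + 1)`, so that tuples compare by \(<_\ell\). For case 1 it needs a word of \(Q_i\) between two given ones; for this it uses our own construction, based on the facts that prepending \(d_{i2}\) enlarges a word of \(Q_i\) and prepending \(d_{i0}\) shrinks it.

```python
def above_q(x):
    """A word of Q_i greater than x in Q_i: prepend d_{i2}."""
    return ((x[0][0], 3),) + x


def below_q(x):
    """A word of Q_i smaller than x in Q_i: prepend d_{i0}."""
    return ((x[0][0], 1),) + x


def between_q(x, y):
    """Given x < y in Q_i, a word of Q_i strictly between them."""
    p = 0
    while x[p] == y[p]:
        p += 1
    i, head, c, d = x[0][0], x[:p], x[p][1], y[p][1]
    if (c, d) == (1, 3):                          # d_{i0} versus d_{i2}: insert d_{i1}
        return head + ((i, 2),)
    if (c, d) == (1, 2):                          # y = head d_{i1}
        return head + ((i, 1),) + above_q(x[p + 1:])
    return head + ((i, 3),) + below_q(y[p + 1:])  # x = head d_{i1}


def between_in_l(u, v, n):
    """Given u <_l v in L (n = size of the PCP instance), a word of L strictly between."""
    p = 0
    while u[p] == v[p]:        # L is prefix-free, so u = w c u', v = w d v' with c < d
        p += 1
    w, c, d = u[:p], u[p], v[p]
    if c[1] > 0:                                  # case 1: c in Delta_i
        if d[0] == c[0] and d[1] > 0:             # d in the same Delta_i
            return w + between_q(u[p:], v[p:])
        return w + above_q(u[p:])                 # d lies above all of Delta_i
    if d[1] > 0:                                  # case 2: d in Delta_i, c below it
        return w + below_q(v[p:])
    if c[0] > n + 2:                              # c = ¢ and d = $
        raise ValueError("u and v differ only in the end markers: (alpha, beta) has a solution")
    return w + ((c[0], 2),)                       # case 3: c = c_i, use d_{i1} in Q_i


n = 3                                  # so a = (4, 0), b = (5, 0), ¢ = (6, 0), $ = (7, 0)
u = ((1, 0), (4, 0), (6, 0))           # 1 a ¢, in L_alpha when alpha_1 = a
v = ((1, 0), (4, 3), (4, 2))           # 1 d_{42} d_{41}, in L_4
print(between_in_l(u, v, n))           # ((1, 0), (4, 1), (4, 3), (4, 2))
```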
Thus, we have shown that if \((\alpha, \beta)\) has no solution, then between any two words
of \(L\) there is a third word of \(L\), completing the proof of the fact that \(L\) is dense.

□ 

Remark 3.2 The language \(L = L(G_{\alpha,\beta})\) generated by the above grammar \(G_{\alpha,\beta}\)
has no least or greatest element with respect to the lexicographic order. Indeed,
if \(v \in L'\), then there exist words \(u, w \in L'\) with \(u <_\ell v <_\ell w\), since the order type
of \(L'\) is \(\eta\). Now consider a word \(v = i_1 \cdots i_m (\alpha_{i_1} \cdots \alpha_{i_m})^{-1} \text{¢}\) in \(L_\alpha\). Then let
\(u = i_1 \cdots i_m 1 (\alpha_{i_1} \cdots \alpha_{i_m} \alpha_1)^{-1} \text{¢}\) and let \(w = d_{i_1 1}\), or any other word in \(Q_{i_1}\). We
have that \(u <_\ell v <_\ell w\) and \(u, w \in L\). Similarly, if \(v = i_1 \cdots i_m (\beta_{i_1} \cdots \beta_{i_m})^{-1} \$\)
is in \(L_\beta\), then \(u <_\ell v <_\ell w\) for the words \(u = i_1 \cdots i_m 1 (\beta_{i_1} \cdots \beta_{i_m} \beta_1)^{-1} \$\) and
\(w = d_{i_1 1}\) in \(L\).
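The witnesses of Remark 3.2 for \(v \in L_\alpha\) or \(v \in L_\beta\) are explicit, and the following Python sketch builds them. The function name `neighbours` is ours, and letters are encoded as pairs, \(c_j\) as `(j, 0)` and \(d_{jk}\) as `(j, k + 1)`, so that Python's tuple comparison is \(<_\ell\).

```python
def neighbours(indices, words, end, n):
    """Remark 3.2 witnesses for v = i_1 ... i_m (w_{i_1} ... w_{i_m})^{-1} end:
    returns (u, v, w) with u <_l v <_l w."""
    code = {"a": (n + 1, 0), "b": (n + 2, 0), "¢": (n + 3, 0), "$": (n + 4, 0)}

    def pcp_word(seq):
        image = "".join(words[i - 1] for i in seq)            # w_{i_1} ... w_{i_m}
        return (tuple((i, 0) for i in seq)                    # the indices c_i = (i, 0)
                + tuple(code[x] for x in reversed(image))     # the reversed image
                + (code[end],))

    u = pcp_word(tuple(indices) + (1,))    # append the index 1
    v = pcp_word(tuple(indices))
    w = ((indices[0], 2),)                 # d_{i_1 1}, a word of Q_{i_1}
    return u, v, w


u, v, w = neighbours((3, 2, 3, 1), ("a", "ab", "bba"), "¢", 3)
print(u < v < w)  # True
```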

We order the binary alphabet \(\{0, 1\}\) by \(0 < 1\).

Theorem 3.3 There exists no algorithm to decide for a context-free (prefix)
grammar \(G\) over \(\{0, 1\}\) whether \((L(G), <_\ell)\) is dense. Moreover, there exists no
algorithm to decide for a context-free (prefix) grammar \(G\) over \(\{0, 1\}\) whether the
order type of \((L(G), <_\ell)\) is \(\eta\).

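Proof. This follows from Proposition 3.1 and Remark 3.2 by an appropriate order-preserving
coding of the letters of the alphabet \(\Delta\) by words over \(\{0, 1\}\) of length
\(\lceil \log_2(4n + 10) \rceil\). Such a fixed-length coding preserves both the prefix property and
the lexicographic ordering, and a countable dense linear ordering without least or greatest
element has order type \(\eta\). □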
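The order-preserving coding can be made explicit: give the \(r\)-th smallest letter of \(\Delta\) the binary representation of \(r\), padded to \(k = \lceil \log_2 |\Delta| \rceil\) bits. The Python sketch below (function names ours; the letters are abstract placeholders standing for \(\Delta\) listed in increasing order) builds such a code and checks on all words of length at most 2 that it carries \(<_\ell\) over \(\Delta\) to \(<_\ell\) over \(\{0, 1\}\).

```python
def binary_code(delta):
    """Map the r-th smallest letter of `delta` (listed in increasing order) to r in
    binary, padded to k = ceil(log2(len(delta))) bits."""
    k = (len(delta) - 1).bit_length()        # smallest k with 2**k >= len(delta)
    return {x: format(r, f"0{k}b") for r, x in enumerate(delta)}


def encode(word, code):
    return "".join(code[x] for x in word)


n = 1
delta = [f"x{r}" for r in range(4 * n + 10)]   # placeholder letters, listed in order
code = binary_code(delta)
rank = {x: r for r, x in enumerate(delta)}
short = [(x,) for x in delta] + [(x, y) for x in delta for y in delta]
# Over delta, comparing rank tuples is <_l; over {0, 1}, Python's str comparison is <_l.
assert all((tuple(map(rank.get, u)) < tuple(map(rank.get, v))) == (encode(u, code) < encode(v, code))
           for u in short for v in short)
print(len(code["x0"]))  # 4
```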

Theorem 3.4 There exists no algorithm to decide for a context-free (prefix)
grammar \(G\) and a right linear (prefix) grammar \(G'\) over \(\{0, 1\}\) whether \((L(G), <_\ell)\)
and \((L(G'), <_\ell)\) are isomorphic.

Proof. Consider an instance \((\alpha, \beta)\) of PCP and the grammar \(G = G_{\alpha,\beta}\) constructed
above. As before, let us code terminal letters by words of length
\(\lceil \log_2(4n + 10) \rceil\) by an order-preserving coding. Thus, \(L(G)\) is a language over
the alphabet \(\{0, 1\}\) such that the order type of \((L(G), <_\ell)\) is \(\eta\) if and only if
\((\alpha, \beta)\) has no solution. Then let \(G'\) be the right linear (prefix) grammar with
productions

\[ S \to 00S \mid 11S \mid 01, \]

generating the language \(\{00, 11\}^* 01\) of order type \(\eta\). Then \((L(G), <_\ell)\) and \((L(G'), <_\ell)\)
are isomorphic if and only if \((\alpha, \beta)\) has no solution. □

4 Conclusion 

We have proved that there is no algorithm to decide whether a context-free 
grammar (even prefix grammar) generates a dense language with respect to the 
lexicographic ordering. As a corollary to this result, we have shown that it is 
undecidable whether two prefix grammars generate languages of the same order
type. 

We can prove that it is decidable in polynomial time whether the lexicographic
ordering of the language generated by a prefix grammar is scattered, or a
well-ordering. Moreover, we can extend the decidability part of this result to
arbitrary context-free grammars. It is likely that a PTIME algorithm can be obtained
for all context-free grammars.